Abstract

Thyroglobulin is a large protein present in all vertebrates. It is synthesized in the thyrocytes and exported to the lumen
of the thyroid follicle, where its tyrosine residues are iodinated. The iodinated thyroglobulin is then taken back up into the
cell and processed (cleaved to free its two extremities) for thyroid hormone synthesis. Thyroglobulin sequence
analysis has identified four regions of the molecule: Tg1, Tg2, Tg3 and ChEL. Structural abnormalities and mutations
result in different pathological consequences, depending on the thyroglobulin region affected. We carried out a
bioinformatic analysis of thyroglobulin, determining the origin and the function of each region. Our results suggest
that the Tg1 region acts as a binding protein on the apical membrane, the Tg2 region is involved in protein
adhesion and the Tg3 region is involved in determining the three-dimensional structure of the protein. The ChEL
domain is involved in thyroglobulin transport, dimerization and adhesion. The presence of repetitive domains in the
Tg1, Tg2 and Tg3 regions suggests that these domains may have arisen through duplication.

Introduction

Thyroglobulin is the precursor of the thyroid hormones 
triiodothyronine (T3) and thyroxine (T4). In humans, 
thyroglobulin is synthesized by thyroid follicle cells, which 
are also known as thyrocytes [1]. Thyroglobulin molecules 
form dimers, which are exported to the lumen of the thy- 
roid follicles [2]. There, the thyroglobulin is immobilized 
on the apical membrane. Thyroid hormone synthesis
starts with the iodination of tyrosine residues. Thyroperoxidase
is activated by H2O2, leading to the oxidation of iodide,
followed by the iodination and conjugation of some of
the tyrosine residues present in the thyroglobulin molecule.
The iodinated and conjugated thyroglobulin is then returned
to the cell via an endocytosis process that may
involve histone H1 [3], megalin (gp330) [4] and/or the
N-acetylglucosamine receptor [5]. Only a very small number
of iodinated tyrosine residues are involved in thyroid hormone
synthesis. T4 is formed by the conjugation of two
residues of diiodotyrosine followed by cleavage. T3 is
formed in a similar manner, but through the conjugation of
diiodotyrosine with monoiodotyrosine [6,7]. T3 is the functional
form; it is generated principally by T4 deiodinases in
the peripheral organs, with only 13% being formed in the
thyroid gland [8]. Thyroid hormones reach their target 
organs via the bloodstream. Thyroglobulin has been re- 
ported to regulate some thyroid genes and the growth of 
epithelial cells [9,10]. It acts as both a hormone and an iod- 
ine reservoir [11]. 

In humans, mice and fish, thyroid hormone levels deter- 
mine the basal rate of metabolism and overall energy 
expenditure [12-14]. In other species, such as Senegalese
sole [15], amphibians [16], urochordates [17], amphioxus
[18] and lamprey [19], thyroid hormones play a critical
role in the metamorphosis from larvae to juveniles. The
structure of the thyroglobulin protein has been studied in detail
[20-22]. This protein is present in all vertebrates and
always has the same structure, consisting of four regions:
the Tg1 (about 10 repetitive domains), Tg2 (3 repetitive
domains), Tg3 (5 repetitive domains) and ChEL regions
(Figure 1-a and 1-b). The Tg1, Tg2 and Tg3 regions (moving
along the molecule from its N-terminal end) consist of
repetitive domains. All three regions are rich in cysteine
residues, allowing them to form disulfide bonds [23]. The
presence of these repetitive domains suggests that they may
have evolved through the duplication of source domains.
The C-terminus of the molecule includes a 581-amino
acid sequence displaying a high degree of similarity to the
sequence of acetylcholinesterase (28% identity) [24,25].
One previous study identified the ChEL domain as the
original source of thyroglobulin [26]. Thyroglobulin contains
about 140 tyrosine residues, but only about 30 of
these residues are iodinated and a very small number of
these iodinated tyrosines undergo conjugation to form T3
and T4 [27]. Only four major thyroid hormone synthesis
sites have been clearly identified in the human thyroglobulin
molecule and these sites are located at either end of the
protein: Tyr5, Tyr2554, Tyr2568 and Tyr2747 [21].

Figure 1 The structure of the thyroglobulin protein. a) Structure of the human thyroglobulin protein. b) Structure of the zebrafish
thyroglobulin protein. c) Structure of the amphioxus thyroglobulin-like protein. d) Structure of the sea urchin thyroglobulin-like protein. Blue: Tg1
domains, red: Tg2 domains, green: Tg3 domains, black: the ChEL domain. Magenta: the location of the thyroid hormone synthesis sites on the
proteins (tyrosine residues).

Thyroglobulin may thus be seen as a huge precursor of 
two very small products. Additional studies of its other, as 
yet unexplored functions in the cell may be useful. For example,
this protein could potentially be involved in the
trafficking of iodine from the thyrocyte to the follicle
lumen and in its storage. Many studies have made use of
bioinformatics tools to analyze the evolution of proteins and
genes, and such tools may be useful in this context [28,29]. 



We performed a phylogenetic analysis of the thyroglobu- 
lin molecule with the sequenced genomes of species corre- 
sponding to key steps in animal evolution. Our results 
provide clues to the evolution of thyroglobulin and to potential
functional roles for the Tg1, Tg2, Tg3 and ChEL
regions.

Materials and methods 

Sequence extraction 

We extracted the available DNA and protein sequences for
thyroglobulin (Tg) from the NCBI databank (www.ncbi.nlm.nih.gov)
for four species: human [GenBank:CAA29104], rat
[GenBank:AAF34909], mouse [GenBank:AAB53204] and pig
[GenBank:ACY66900]. We also extracted six predicted
sequences: cattle [GenBank:NP_776308], horse [GenBank:XP_001916622],
marmoset [GenBank:XP_002759270],
panda [GenBank:XP_002917659], zebrafish [GenBank:XP_694292]
and zebra finch [GenBank:XP_002188056]. For
other species for which the thyroglobulin amino-acid sequence is
unknown, such as opossum and fugu, we used the human
thyroglobulin genomic sequence in Blast searches of the
UCSC website (genome.ucsc.edu). We first translated the
DNA sequence to obtain a putative amino-acid sequence.
We then used Blast to check whether the predicted
sequence was present in the database (chr3:411,623,333-412,004,486
and chrUn:270,007,531-270,025,053 for opossum
and fugu, respectively). Homologous sequences from
amphioxus [GenBank:XP_002607132] and sea urchin
[GenBank:XP_001202473] were also identified by BLAT analysis
(chrUn:353,044,426-353,083,914 and Scaffold82420:233-1,088
in the amphioxus and sea urchin genomes,
respectively).
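
The translation step mentioned above (DNA to a putative amino-acid sequence) can be illustrated with a short sketch. This is not the tool used in the study, which is not named; it is a minimal example assuming the standard genetic code and a single forward reading frame. The function names and the toy fragment are hypothetical.

```python
# Minimal sketch: translate DNA to protein with the standard genetic code.
# Assumption: standard code, forward strand only; names are illustrative.

BASES = "TCAG"
# Amino acids for the 64 codons, ordered by first, second, third base in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA string in one forward reading frame (0, 1 or 2).

    Stop codons become '*'; codons with ambiguous bases (e.g. N) become 'X'.
    Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# Hypothetical toy fragment, not a real thyroglobulin sequence.
toy_dna = "ATGTGTTATTAA"
print(translate(toy_dna))  # prints MCY*
```

In practice, a genomic region would be translated in all reading frames (and on both strands) before the putative protein is compared with known sequences.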

Sequence similarity 

We searched for regions presenting sequence similarities to
the constituent domains of thyroglobulin (Tg1, Tg2, Tg3
and ChEL) with the Blastall command (ftp://ftp.ncbi.nlm.nih.gov/blast/db/,
version 2.2.19). Pairs of sequences were
compared on the basis of their global alignment with
the Myers & Miller algorithm (manpages.ubuntu.com/manpages/karmic/man1).
Results were generated in a separate
text file containing alignment diagrams, scores, and degrees
of identity, similarity and gaps. We used ClustalX software
(ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) for the analysis of
multiple alignments of three or more sequences. The
results were output to a separate text file, but without
information about score, because it was not possible to use
more than two sequences for score calculation with
Compositional Matrix Adjust.
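
The pairwise comparison described above reports a score, a degree of identity and a proportion of gaps for a global alignment. The sketch below shows how such values can be derived. It is a plain Needleman-Wunsch alignment with a linear gap penalty and quadratic memory, not the linear-space Myers & Miller algorithm used in the study, and the match, mismatch and gap values are arbitrary assumptions rather than the parameters used by the authors.

```python
# Minimal sketch of global alignment plus identity/gap statistics.
# Assumptions: simple match/mismatch scoring and a linear gap penalty.

def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b, score) for an optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def identity_and_gaps(top, bottom):
    """Percent identical columns and percent gapped columns of an alignment."""
    length = len(top)
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    gapped = sum(1 for x, y in zip(top, bottom) if x == "-" or y == "-")
    return 100.0 * identical / length, 100.0 * gapped / length


# Hypothetical toy peptides, not real thyroglobulin domains.
aligned_a, aligned_b, total = global_align("CKAC", "CAC")
print(aligned_a, aligned_b, total, identity_and_gaps(aligned_a, aligned_b))
```

Identity is expressed here relative to the alignment length including gapped columns; other conventions (for example, relative to the shorter sequence) give different percentages.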

Phylogeny 

We used the neighbor-joining (NJ) method in PHYLIP
[30] and MEGA 5 [31] for phylogenetic analysis. A range of
analyses, from simple p-distance to multiparameter models
with gamma correction, was used. The significance of the
phylogenetic tree was assessed by bootstrapping, with
10,000 iterations. The Jones-Taylor-Thornton (JTT)
model of amino-acid sequence evolution, with gamma
correction, was used for distance estimation [32]. In each
case, the distance was validated with 10,000 bootstrap
replications.
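
The two simplest ingredients of the analysis described above, the p-distance and the bootstrap, can be sketched as follows. This is an illustration only: it does not build a neighbor-joining tree and does not implement the JTT model or gamma correction, for which PHYLIP and MEGA 5 were used. The toy alignment, the function names and the "closest pair" criterion used as a stand-in for a tree node are all assumptions made for the example.

```python
# Minimal sketch: p-distance and bootstrap resampling of alignment columns.
# Assumption: sequences are already aligned and of equal length.
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances keyed by alphabetically ordered name pairs."""
    names = sorted(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a in names
        for b in names
        if a < b
    }


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: sample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def support_for_pair(alignment, pair, replicates=1000, seed=0):
    """Fraction of replicates in which `pair` is strictly the closest pair."""
    pair = tuple(sorted(pair))
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        dists = distance_matrix(bootstrap_alignment(alignment, rng))
        if all(dists[pair] < d for key, d in dists.items() if key != pair):
            hits += 1
    return hits / replicates


# Hypothetical toy alignment, not real Tg1 domain sequences.
toy = {"human": "CKACLE", "mouse": "CKACLD", "fish": "CRSCLD"}
print(distance_matrix(toy))
print(support_for_pair(toy, ("human", "mouse"), replicates=200))
```

In a full analysis, a tree would be rebuilt from each replicate and the support of a node would be the fraction of replicate trees containing that node.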

Results 

The N-terminal Tg1 region

In humans, the first region of thyroglobulin consists of 10
Tg1 repeat domains, each containing 50 amino acids and
displaying 14% identity. However, Molina et al. identified
an 11th domain located after the Tg2 region [33]. A comparison
of this region in all the thyroglobulin protein
sequences extracted (13 species) indicated that the fish
thyroglobulins (zebrafish and fugu) lacked Tg1-7 and Tg1-9
(Additional file 1: Figure S1). We used MEGA 5 software to
calculate the distances between the whole thyroglobulin protein
sequences and between each of the component regions
(Additional file 2: Table S2, Additional file 3: Table S3,
Additional file 4: Table S4 and Additional file 5: Table S5).
We performed a phylogenetic analysis on the thyroglobulin
Tg1 domains of four vertebrate species (human, 10 Tg1
domains; mouse, 10 Tg1 domains; zebra finch, 10 Tg1
domains; and zebrafish, 8 Tg1 domains), six Tg1 domains
from amphioxus (a cephalochordate) and two Tg1 domains
from sea urchin (an echinoderm) (Figure 2). The sixth
amphioxus Tg1 domain clustered with the second sea
urchin domain in the phylogenetic tree. With a lower bootstrap
percentage, we observed two major branches of
the phylogenetic tree, the first corresponding to the sea
urchin and amphioxus Tg1 domains, which clustered with
the thyroglobulin Tg1-8, Tg1-2, Tg1-1 and Tg1-10 domains,
and the second corresponding to the Tg1-3, Tg1-4, Tg1-7,
Tg1-5, Tg1-9 and Tg1-6 domains. For confirmation of these
results, we performed a phylogenetic analysis on the
thyroglobulin Tg1 domains of 13 vertebrate species (human,
marmoset, pig, horse, dog, panda, rat, mouse, cow, opossum
and zebra finch, with 10 Tg1 domains each, and zebrafish
and fugu, with 8 Tg1 domains each), together with
six Tg1 domains from amphioxus and two from sea urchin.
This new tree also had two major branches (Additional file
6: Figure S2). The fish Tg1-10 domains did not cluster with
the other Tg1-10 domains in either of the trees. We also
investigated the genome of the urochordate Ciona intestinalis.
The protein with the largest number of Tg1 motifs
was a predicted protein (rather than one for which the
amino-acid sequence was actually known) similar to
entactin/nidogen (XP_002125504.1) and containing three
Tg1 motifs. We generated two phylogenetic trees, one
based on 13 vertebrate Tg1 regions (human, marmoset,
pig, horse, dog, panda, rat, mouse, cow, opossum, zebra
finch, zebrafish and fugu) (Figure 3) and the second based
on 13 vertebrate thyroglobulin proteins (human, marmoset,
pig, horse, dog, panda, rat, mouse, cow, opossum,
zebra finch, zebrafish and fugu) together with the
sequences of thyroglobulin homologs from Ciona intestinalis,
amphioxus and sea urchin (Figure 4).

We investigated the function of the Tg1 region of thyroglobulin
by searching for proteins containing domains similar
to the Tg1 domain, with a cutoff e-value of 0.15, as
recommended by the software. For the Tg1-1 domain, 107
proteins were selected (15 thyroglobulins, 15 nidogens, 18
testicans, 16 secreted protein, acidic, cysteine-rich (SPARC)
proteins, 13 invariant chains and 30 unnamed or hypothetical
proteins). For the Tg1-2 domain, 49 proteins were
retained (15 thyroglobulins, 13 nidogens, 3 testicans,
2 SPARC proteins, 3 invariant chains and 13 unnamed or
hypothetical proteins). For the Tg1-3 domain, 16 proteins
were found (thyroglobulins only). For the Tg1-4 domain,
we retained 97 proteins (16 thyroglobulins, 13 nidogens,
10 testicans, 13 SPARC proteins, 4 invariant chains, 1
insulin-like growth factor binding protein (IGFBP) and 40
unnamed or hypothetical proteins). For the Tg1-5 domain,
103 proteins were retained (18 thyroglobulins, 6 nidogens,
11 testicans, 19 SPARC proteins, 3 invariant chains, 2
IGFBPs and 44 unnamed or hypothetical proteins). For the
Tg1-6 domain, 49 proteins were identified (17 thyroglobulins,
14 nidogens, 3 invariant chains and 15 unnamed or
hypothetical proteins). For the Tg1-7 domain, 30 proteins
were retained (10 thyroglobulins, 11 nidogens and 9
unnamed or hypothetical proteins). For the Tg1-8 domain,
105 proteins were retained (17 thyroglobulins, 13 nidogens,
9 testicans, 17 SPARC proteins, 5 invariant chains and 44
unnamed or hypothetical proteins). For the Tg1-9 domain,
11 proteins were found (thyroglobulins only). For the Tg1-10
domain, 100 proteins were retained (17 thyroglobulins,
13 nidogens, 7 testicans, 15 SPARC proteins, 3 invariant
chains and 45 unnamed or hypothetical proteins). The
number of thyroglobulin proteins displaying sequence similarity
to the human Tg1 domains varied from 15 to 17,
essentially due to the presence of incomplete thyroglobulin
protein sequences in the databases we used, particularly for
bears.

The abovementioned proteins displayed sequence
similarities to the Tg1 regions of proteins from five families
[34]: testicans, SPARC-related modular calcium binding
(SMOC) proteins, nidogens, IGFBPs and invariant chains.
Testican proteins are involved in the regulation of cell
attachment, cysteine protease and metalloprotease activities
[35-38]. SMOC proteins are glycoproteins present principally
at the basement membrane and involved in the regulation
of calcium binding [39,40]. SMOC and testican
proteins are present in metazoans. Proteins of the nidogen
family are known to control the three-dimensional structure
of the basal membrane [41]. Nidogen proteins are also
involved in cell attachment, neutrophil chemotaxis and
nervous system development [42,43]. IGFBP belongs to a
family of seven proteins with high affinity for IGF and with
different functions in several tissues [44]. Nidogen and IGFBP
are present in both tunicates and craniates. The invariant
chain is involved in MHC-II cell formation [45]. This protein,
like the thyroglobulin protein, is present only in
vertebrates.

Figure 2 Phylogenetic analysis of thyroid hormone precursor Tg1 domains for six species. Phylogenetic analysis of thyroglobulin Tg1
domains from 4 species (human, mouse, zebra finch and zebrafish) and the amphioxus and sea urchin thyroglobulin-like Tg1 domains. Evolutionary
history was inferred by the neighbor-joining method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the
evolutionary history of the taxa analyzed. The numbers at nodes represent bootstrap support scores.

Figure 3 Phylogenetic analysis of thyroglobulin Tg1 regions from 13 species. Phylogenetic analysis of thyroglobulin Tg1 regions from 13
species (human, marmoset, rat, mouse, panda, dog, horse, pig, cow, opossum, zebra finch, zebrafish and fugu). Evolutionary history was inferred
by the neighbor-joining method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the
taxa analyzed. The numbers at nodes represent bootstrap support scores.

The Tg2 region 

The second region consists of three Tg2 repetitive domains
of 15 amino acids each, presenting 24% identity. The
phylogenetic analysis of this region was less robust than
that of the Tg1 region, due to the small size of these
domains. However, we identified 33 proteins displaying
sequence similarity to the Tg2 region. Nine were thyroglobulins:
Bos taurus, Mus musculus, Rattus norvegicus,
Macaca mulatta, Canis lupus familiaris, Equus caballus,
Sus scrofa, Taeniopygia guttata and Danio rerio. Eleven
were signal peptide-CUB domain-EGF-like (SCUBE)
proteins (SCUBE3: Canis familiaris, Mus musculus, Homo
sapiens, Macaca mulatta, Sus scrofa and Danio rerio; and/or
SCUBE1: Homo sapiens, Canis familiaris, Rattus norvegicus,
Bos taurus and Danio rerio). The other 13 proteins
were unnamed or hypothetical proteins. SCUBE proteins
are known to be involved in adhesion. Queries of the PFAM
database (http://pfam.sanger.ac.uk) identified a GCC2-GCC3
domain conserved in the Tg2 region of mouse and human
thyroglobulins. The GCC2-GCC3 domain is also present
in the human SVEP1 and mouse SCUB2 proteins.

Figure 4 The phylogenetic analysis of thyroglobulins from 13 species. Phylogenetic analysis of thyroglobulins from 13 species (human,
marmoset, rat, mouse, panda, dog, horse, pig, cow, opossum, zebra finch, zebrafish and fugu) and of thyroglobulin-like proteins from Ciona intestinalis,
amphioxus and sea urchin. Evolutionary history was inferred by the neighbor-joining method. The bootstrap consensus tree inferred from 1000
replicates is taken to represent the evolutionary history of the taxa analyzed. The numbers at nodes represent bootstrap support scores.

The Tg3 region 

In humans, the Tg3 region consists of five repetitive
domains that can be classified into two subgroups: three
domains in subgroup a (Tg3-a1: 111 AA, Tg3-a2: 98 AA,
Tg3-a3: 58 AA) and two domains in subgroup b (Tg3-b1:
163 AA, Tg3-b2: 130 AA). These Tg3 domains are 9% identical
(Figure 5-a and 5-b). A search for proteins displaying
sequence similarity to Tg3 domains identified only
thyroglobulin proteins. Interestingly, the best conservation of
cysteine residues between domains was observed in
humans, with perfect conservation (100%) for Tg3-a
domains and very high levels of conservation (87%) for
Tg3-b domains. Furthermore, the five amino acids perfectly
conserved in all Tg3 domains were cysteine residues
(Figure 5-c). Cysteine residues account for 6% of all the
amino acids present in the human Tg3 region and these
residues were remarkably conserved in the thyroglobulin
Tg3 regions of all the species studied; 100% of the 34
cysteine residues were perfectly conserved between the Tg3
regions of 12 species (human, rat, panda, marmoset,
mouse, horse, dog, cow, zebra finch, zebrafish and fugu).
In the opossum, two of the 34 cysteine residues in the Tg3
region were displaced (Additional file 1: Figure S1).
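
The cysteine conservation figures reported above are derived from alignments. A minimal sketch of one way to count conserved cysteine columns is given below; the exact definition of conservation used in the study is not stated, so the criterion here (a column counts as conserved only if every sequence has a cysteine at that position) is an assumption, as are the function name and the toy sequences.

```python
# Minimal sketch: count alignment columns in which cysteine is fully conserved.
# Assumption: input sequences are pre-aligned, equal in length, with '-' for gaps.

def cysteine_conservation(aligned_seqs):
    """Return (conserved_columns, cysteine_columns).

    cysteine_columns: columns where at least one sequence has a cysteine.
    conserved_columns: columns where every sequence has a cysteine.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    conserved = total = 0
    for column in zip(*aligned_seqs):
        if "C" in column:
            total += 1
            if all(residue == "C" for residue in column):
                conserved += 1
    return conserved, total


# Hypothetical toy alignment, not real Tg3 domain sequences.
toy_alignment = ["CAKC-EC", "CSKCLDC", "CAR-LEC"]
conserved, total = cysteine_conservation(toy_alignment)
print(f"{conserved} of {total} cysteine columns conserved")  # prints 2 of 3 cysteine columns conserved
```

With this definition, a cysteine that is displaced by even one alignment column in a single species, as reported for the opossum, is counted as not conserved.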

Tg3 domains were found only in vertebrate thyroglobulins.
We investigated the origin of the Tg3 region
domains by comparing the sequences of the zebrafish
Tg3 domains with the amphioxus protein. We found a
similar sequence in region 413-441 of the amphioxus
protein. Phylogenetic analysis including this region with
the human and zebrafish thyroglobulin Tg3 domains
clustered the 413-441 region of the amphioxus protein
with the Tg3-b2 domain of the human and the zebrafish
thyroglobulins, albeit with a low bootstrap percentage
(data not shown).

Figure 5 Alignment of the Tg3 domains in human thyroglobulin. a) The alignment of Tg3-a domains in human thyroglobulin: 83% of the
cysteine residues are conserved in Tg3-a1, Tg3-a2 and Tg3-a3; 100% of the cysteine residues are conserved between Tg3-a1 and Tg3-a2. 55% of
the conserved amino acids in Tg3-a domains are cysteine residues. b) The alignment of Tg3-b domains in human thyroglobulin: 100% of the
cysteine residues are conserved in Tg3-b1 and Tg3-b2; 23% of the conserved amino acids in Tg3-b domains are cysteine residues. c) The
alignment of all Tg3 domains in human thyroglobulin: 44% of the cysteine residues are conserved in all Tg3 domains; 100% of the conserved
amino acids in Tg3 domains are cysteine residues.

The C-terminal ChEL domain 

The ChEL domain of human thyroglobulin consists of 581 
amino acids. This region displays a high level of similarity 
to acetylcholinesterase, hence its name. Acetylcholinester- 
ase catalyzes the degradation of acetylcholine in the regula- 
tion of neurotransmission [46]. Blastall analyses of the 
ChEL domain identified 992 proteins displaying sequence 
similarity to this domain: 30 thyroglobulin proteins, 598 
esterases (either carboxylesterases (n = 205) or cholines- 
terases (n = 150)) and 35 neuroligins. Cholinesterase-like 
regions have previously been identified in both enzymes 
and structural proteins [25]. When present in structural 
proteins, this region is thought to be related to cell move- 
ment, as a first sign of cell differentiation [47]. The function 
of the ChEL domain in thyroglobulin was recently linked 
to its transport throughout the endoplasmic reticulum [48]. 
Furthermore, ChEL-truncated thyroglobulin has been 
shown to be unable to form homodimers [49]. 

Thyroid hormone synthesis sites 

We determined the number of thyroid hormone synthesis 
sites in the thyroglobulin proteins studied here. The 
human thyroglobulin protein contains four major thyroid
hormone synthesis sites [21]. An alignment of thyroglobulin
sequences showed that the zebra finch, zebrafish and
fugu proteins contained only three of the human thyroid
hormone synthesis sites (Additional file 1: Figure S1). The
first site (Tyr5) is the main site of hormone synthesis 
(more than 50%) [50] and was found to be present in all 
the thyroglobulin proteins studied. In amphioxus, the tyro- 
sine residue in this position was replaced by a phenylalan- 
ine residue. Sequence alignment data showed that only the 
third site was present in amphioxus and that the Ciona 
intestinalis protein contained no thyroid hormone synthe- 
sis sites. 

Discussion 

The Tg1 region of thyroglobulin may be involved in
binding

In vertebrates, iodination of the tyrosine residues of
thyroglobulin requires the protein to be present in the lumen of
the thyroid follicle. The iodinated thyroglobulin is then
returned to the cell via a process called pinocytosis, which
involves histone H1 [3], megalin (gp330) [4] and/or the
N-acetylglucosamine receptor [5]. Our study of the
thyroglobulin Tg1 region showed this region to be structurally
related to proteins with binding functions from five families.
Novinec et al. [34] also described another protein with
sequence similarity to the Tg1 domain, trophinin. This
membrane protein has been shown to mediate homophilic
cell adhesion [51]. We think the Tg1 region
may mediate the binding of thyroglobulin to the thyrocyte
apical membrane. The region of the H1 histone binding to
thyroglobulin remains unidentified, whereas two regions
of the N-acetylglucosamine receptor have been reported
to bind thyroglobulin: RHL-1 subunit (N1-A500) [5] and
(S789-M1,172) [52]. These receptors bind to the
N-terminal end (Tg1 region) of the thyroglobulin protein. By
contrast, megalin has been shown to interact with the
carboxy-terminal domain of thyroglobulin, at R2,489-E2,503
[53], although the authors of this study were themselves
critical of this work [54]. They reported that the
region of interaction was poorly conserved between
human and rat thyroglobulins, and their finding that a
rabbit antibody raised against R2,489-E2,503 reduced
heparin-binding to rat Tg by only 70% led them to conclude
that other heparin-binding sites must be involved in
binding. These data, including the similarity of the Tg1
region to the extracellular matrix proteins nidogen and
testican, provide support for our hypothesis that the Tg1
region is involved in the attachment and endocytosis of
thyroglobulin.

Phylogeny of the Tg1 region

The function of thyroglobulin seems to depend strongly on 
the follicle structure of the thyroid. This follicular structure 
is observed only in vertebrates. Nonetheless, although it 
remains unclear whether a colloid is present in the endo- 
style of the invertebrates of the chordate group, such as 
cephalochordates and urochordates, the endostyle is widely 
considered to be homologous to the follicle of the verte- 
brate thyroid gland [55]. This is not consistent with the 
detection of a thyroglobulin protein in Eisenia fetida by 
Wilhelm [56]. In annelids, hormones are produced exclu- 
sively by the central nervous system. No sequence that 
could be unambiguously identified as corresponding to a 
thyroglobulin was found in the amphioxus genome [57], 
but a large protein (about 2,400 amino acids) with bio- 
chemical properties similar to those of thyroglobulin has 
been described in this organism [58]. Both T3 and T4 have 
also been described in this cephalochordate [59]. The 
2,400-amino acid thyroglobulin-like protein of this species
contains six domains displaying sequence similarity to the
Tg1 region but none similar to the Tg2, Tg3 and ChEL domains
(Figure 1-c).

Another smaller protein of about 137 amino acids that 
clusters with vertebrate thyroglobulin in phylogenetic 
analysis was identified in sea urchin (Figure 4). This protein
contains two Tg1 domains but has no Tg2, Tg3 or
ChEL domains (Figure 1-d). Phylogenetic analysis of a
large number of sequences [60] classified the urochordates
as more closely related to vertebrates than the
cephalochordates (amphioxus) and echinoderms (sea urchin).
On the basis of these data, we looked for a protein
homologous to thyroglobulin in urochordates. Patricolo
et al. demonstrated the presence of thyroid hormones
and their involvement in metamorphosis in ascidian larvae
from the Urochordata [17]. However, the genome of
another urochordate, Ciona intestinalis, was found to 
contain no sequence homologous to thyroglobulin des- 
pite the presence of thyroid hormones. These data sug- 
gest that ascidians use other precursor proteins for 
iodotyrosine synthesis [61]. Together, these data suggest 
that the origins of the thyroglobulin protein lie in the 
Echinodermata. 

We investigated the origin of the Tg1 region domains
by studying the phylogeny of the Tg1 domains in an analysis
including the sea urchin protein (Echinodermata),
the amphioxus protein (Cephalochordata), and the zebrafish
(Teleostei), zebra finch (Aves) and human thyroglobulins.
Our results suggest that the second Tg1
domain of the sea urchin protein is the ancestor of the
sixth Tg1 domain of the amphioxus protein, while Tg1
domains 1, 2, 3, 4 and 5 of the amphioxus protein probably
result from the duplication of domain 6. The
phylogenetic analysis suggested that the Tg1-1, Tg1-2,
Tg1-8 and Tg1-10 domains of thyroglobulin were derived
directly from the Tg1 domains of the amphioxus protein
(Figure 2). The separation of thyroglobulin domains into
two major branches may indicate two different origins of
thyroglobulin Tg1 domains. The thyroglobulin Tg1
domains clustering with the amphioxus protein Tg1
domains are located at the ends of the Tg1 region. We
suggest that the thyroglobulin Tg1 domains duplicated
from the two ends to the center of the Tg1 region. The
number of Tg1 domains present increases with the
number of evolutionary steps, suggesting that the evolution
of thyroglobulin function may depend on the
number of Tg1 domains. However, the branching of the
tree for Tg1 domains has only weak bootstrap support
(Figure 2 and Additional file 6: Figure S2), probably due
to the length of time over which evolution has been occurring.
Each Tg1 domain is free to evolve by itself, but
the overall structure of the Tg1 region is conserved
(Figure 3).

Involvement of the Tg2 region in cell adhesion 

The presence of the Tg2 region in the SCUBE protein of 
many species suggests that these proteins may have a 
common function. SCUBE is a protein found in many 
embryonic tissues [62]. In zebrafish, mutations in the 
SCUBE2 gene are associated principally with developmental
deficits [63]. A recent study showed that SCUBE1
was an adhesive molecule mediating platelet-matrix
interaction and ristocetin-induced platelet agglutination
[64]. On the basis of its secretory nature, SCUBE3 is
thought to function locally or at a distance, in a paracrine
or endocrine fashion [65]. However, the exact functions
of SCUBE3 remain elusive. On the basis of our results and
these published findings, we suggest that the Tg2 region is involved in
thyroglobulin-mediated cell adhesion. The conservation of
the GCC2-GCC3 domain in the Tg2 region highlights the
structural conservation of this region. The function of the
GCC2-GCC3 domain remains unknown, but this domain
is present in the human SVEP1 protein. The functional
annotation of this protein indicates a role in cell adhesion.
This is potentially consistent with our hypothesis that the
Tg2 region is involved in cell adhesion.

The Tg3 region may have a structural function 

Cysteine is important for the correct three-dimensional
structure of a protein, through its role in the formation
of disulfide bonds. Misfolded proteins are recognized as
abnormal and disposed of by a non-lysosomal proteolytic
pathway. Hishinuma et al. [66] showed that replacement
of cysteine residues in thyroglobulin (C1236R and C1995S)
prevents the protein from forming the disulfide
bonds required for thyroglobulin monomer production.
As a result, intracellular transport is blocked and both
these mutated thyroglobulins are retained in the endoplasmic
reticulum. The high degree of cysteine residue
conservation in Tg3 domains and in Tg3 regions from
the 13 species used to generate the phylogenetic tree,
from Actinopterygii to humans, highlights the importance
of correct disulfide bond formation to the tertiary
structure of thyroglobulin. In a recent study,
Targovnik et al. [67] reviewed the cysteine mutations in
thyroglobulin and showed that more than half these
mutations (55%) occurred in the Tg3 region. They also
reported changes to the three-dimensional structure of
thyroglobulin in the presence of cysteine mutations in
the Tg3 region. The presence of Tg3 regions only in
thyroglobulin proteins may be explained by a structural
function, the disulfide bonds being essential to the
three-dimensional structure of the molecule. The region of
homology highlighted here between the zebrafish Tg3
region and the amphioxus protein suggests that this
region, the best conserved between Tg3 domains, may be the
origin of these domains. There are two Tg3 subgroups, a
and b. We therefore suggest that the original sequence
duplicated twice initially, to generate the Tg3-a and Tg3-b
domains. The Tg3-a domain duplicated three times,
generating Tg3-a1, Tg3-a2 and Tg3-a3, and the Tg3-b domain
duplicated twice, giving rise to Tg3-b1 and Tg3-b2.

The ChEL domain is involved in protein transport 

The two studies mentioned above [48,49] demonstrated 
that a role for the ChEL domain in the dimerization and 
transport of thyroglobulin. Kim et al [68] indicated that 
mutations affecting the ChEL domain of mouse thyro- 
globulin resulted in the synthesis of a full-length thyro- 
globulin that folded abnormally, preventing its transport 
to the Golgi complex. However, the ChEL domain is 
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present in structural proteins, as described by Krejci et al 
[25]. We demonstrated the similarity of this domain be- 
tween thyroglobulin, esterase and neuroligin proteins, 
neuroligins being heterophilic cell adhesion proteins 
[69]. We suggest that the ChEL domain is involved in 
thyroglobulin transport (thyrocyte to apical membrane) 
and dimerization, with a possible additional function in 
cell adhesion. Phylogenetic studies of esterase domains 
from less evolved species have indicated that the thyroglobulin
ChEL and esterase domains have a common ancestor
[26]. Additional file 2: Table S1, Additional file 3:
Table S2, Additional file 4: Table S3 and Additional file
5: Table S4 show the pairwise distances between whole
thyroglobulin protein sequences and each region of
thyroglobulin. The ChEL domain is the region of thyro- 
globulin for which distances were lowest between differ- 
ent species. This suggests that the thyroglobulin ChEL 
domain may have been less subject to rearrangement 
during evolution than the other domains. 
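
As a rough illustration of how such a between-region comparison can be made, the sketch below computes the mean pairwise distance for each region from aligned sequences and reports the region with the lowest mean. It is a simplification under stated assumptions: it uses the plain proportion of differing sites (p-distance) instead of the Jones-Taylor-Thornton model with gamma-distributed rates used for the Additional file tables, and the names `p_distance`, `mean_pairwise_distance` and `toy_alignments`, as well as the sequences themselves, are hypothetical placeholders rather than data from this study.

```python
from itertools import combinations

GAP = "-"


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def mean_pairwise_distance(alignment):
    """Average p-distance over all pairs of sequences in one aligned region."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignments, invented for illustration only (not thyroglobulin data).
toy_alignments = {
    "region_A": {"sp1": "ACDEFGHIKL", "sp2": "ACDEFGHIKL", "sp3": "ACDEYGHIKL"},
    "region_B": {"sp1": "MNPQRSTVWY", "sp2": "MNAQRSTVWF", "sp3": "MDPQ-STVWY"},
}

means = {name: mean_pairwise_distance(aln) for name, aln in toy_alignments.items()}
# The region with the lowest mean distance is the best conserved in this toy set.
most_conserved = min(means, key=means.get)
```

With real data, the same comparison would be run on the model-corrected distance matrices rather than on raw p-distances, which underestimate divergence between distant species.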

Existence of other thyroid hormone synthesis sites 

We show here that not all the thyroid hormone synthesis 
sites characterized to date are systematically present in all 
species with a thyroglobulin protein. The lack of some thyroid
hormone synthesis sites in some more highly evolved
species (3 in zebra finch, zebrafish and fugu), the presence
of only one site in the amphioxus protein and the total 
absence of thyroid hormone synthesis sites in the sea 
urchin protein may be explained by the relocation of these 
sites. Thyroid hormone synthesis requires tyrosine residue
iodination. The sea urchin protein has five tyrosine residues
(positions 3, 24, 31, 34 and 102) and at least one of
these residues may be a thyroid hormone synthesis site. The lack
of sites in amphioxus, zebra finch, zebrafish and fugu may
be explained by the lack of a need for large-scale thyroid
hormone production or by the use of other tyrosine residues as
thyroid hormone synthesis sites.
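
The kind of check described in this section, whether a tyrosine known to be a hormone synthesis site in one species is present at the aligned position in another, can be sketched as follows. This is a minimal illustration only: the function names `column_of_residue` and `conserved_tyrosine_sites`, the two short aligned sequences and the site numbers are all hypothetical and are not taken from the thyroglobulin alignment.

```python
def column_of_residue(aligned_seq, residue_number):
    """Map a 1-based ungapped residue number to its 0-based alignment column."""
    seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("residue number exceeds sequence length")


def conserved_tyrosine_sites(aligned_ref, aligned_query, ref_sites):
    """For each reference tyrosine site, report whether the query has Y in the same column."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for site in ref_sites:
        col = column_of_residue(aligned_ref, site)
        if aligned_ref[col] != "Y":
            raise ValueError(f"reference residue {site} is not a tyrosine")
        result[site] = aligned_query[col] == "Y"
    return result


# Invented aligned sequences; the reference has tyrosines at residues 3, 6 and 10.
ref = "MAY-LKYGSTY"
query = "MAYQLKFGS-Y"
sites = conserved_tyrosine_sites(ref, query, [3, 6, 10])
```

A site reported as absent by such a check is not necessarily lost: as suggested above, a tyrosine at a different position could take over the same role, which a column-by-column comparison cannot detect.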

We explored the function of thyroglobulin by phyl- 
ogeny; we compared the thyroglobulin regions of echino- 
derms and vertebrate species. Our results suggest that 
the Tg1 region may have been the first to appear in the
thyroglobulin protein. The Tg1 region was also subject to
the largest number of rearrangements during evolution. 
The Tg2, Tg3 and ChEL regions are present only in the 
thyroglobulin of vertebrates, suggesting a link between 
these regions and an adaptive function of thyroglobulin. 
The thyroglobulin protein seems to result from the as- 
sembly of the four regions. We found no precursor of 
thyroid hormones with only two or three of these regions 
in databases. We therefore suggest that the Tg2, Tg3 and 
ChEL regions appeared in thyroglobulin at the same 
time. These data support the hypothesis of potential add- 
itional functions of thyroglobulin in the cell, as an iodine 
reservoir, in cell-cell adhesion and in binding. As each
thyroglobulin region may have a specific function in the
protein, a mutation in one region may have conse- 
quences for the specific function of this region, resulting 
in a different pattern of phenotypic expression. 

Note 

A recent study raised the question of human DNA contamination
in genomic databases [70]. The first 5477 bp of
chromosome 11 in zebrafish is 100% identical to human 
chromosome 4. We verified the zebrafish thyroglobulin 
located on chromosome 16 at position chr16:33,835,318-
33,852,335, and the human thyroglobulin located on 
chromosome 8 at position chr8:133,909,894-134,147,141. 

Additional files 



Additional file 1: Figure S1. The phylogenetic analysis of Tg1 domains
from the thyroglobulins of 13 species. The phylogenetic analysis of
thyroglobulin Tg1 domains from 13 species (human, marmoset, rat, mouse,
panda, dog, horse, pig, cow, opossum, zebra finch, zebrafish and fugu) and
the Tg1 domains of thyroglobulin-like proteins from Ciona intestinalis,
amphioxus and sea urchin. Evolutionary history was inferred by the
neighbor-joining method. The bootstrap consensus tree inferred from 1000
replicates is taken to represent the evolutionary history of the taxa
analyzed. The numbers at nodes represent bootstrap scores.

Additional file 2: Table S1. Estimation of evolutionary divergence
between the thyroglobulin protein sequences of 13 species and the
thyroglobulin-like sequences of Ciona intestinalis, amphioxus and sea
urchin. The number of amino-acid substitutions per site between 
sequences is shown. Standard error estimates are shown above the 
diagonal and were obtained by a bootstrap procedure (10000 replicates). 
Analyses were conducted with the Jones-Taylor-Thornton matrix-based 
model. The rate variation between sites was modeled with a gamma 
distribution (shape parameter = 1).

Additional file 3: Table S2. Estimation of evolutionary divergence 
between the Tg1 region sequences of thyroglobulins from 13 species. The
number of amino acid substitutions per site between sequences is shown. 
Standard error estimates are shown above the diagonal and were obtained 
by a bootstrap procedure (10000 replicates). Analyses were conducted with 
the Jones-Taylor-Thornton matrix-based model. The rate variation between 
sites was modeled with a gamma distribution (shape parameter = 1). 

Additional file 4: Table S3. Estimation of evolutionary divergence 
between the Tg3 region sequences of thyroglobulins from 13 species. The 
number of amino acid substitutions per site between sequences is shown. 
Standard error estimates are shown above the diagonal and were obtained 
by a bootstrap procedure (10000 replicates). Analyses were conducted with 
the Jones-Taylor-Thornton matrix-based model. The rate variation between 
sites was modeled with a gamma distribution (shape parameter = 1). 

Additional file 5: Table S4. Estimation of evolutionary divergence 
between the ChEL region sequences of thyroglobulins from 13 species. The 
number of amino acid substitutions per site between sequences is shown. 
Standard error estimates are shown above the diagonal and were obtained 
by a bootstrap procedure (10000 replicates). Analyses were conducted with 
the Jones-Taylor-Thornton matrix-based model. The rate variation between 
sites was modeled with a gamma distribution (shape parameter = 1). 

Additional file 6: Figure S2. ClustalX sequence alignment for 
thyroglobulins from 13 species. The ClustalX sequence alignment of 
thyroglobulins from 13 species (human, marmoset, rat, mouse, panda, dog, 
horse, pig, cow, opossum, zebra finch, zebrafish and fugu) and the 
amphioxus and sea urchin thyroglobulin-like proteins. In red, the four
human thyroid hormone synthesis sites; in green, the 10 human Tg1
domains; in yellow, the human Tg2 region; in blue, the 5 human Tg3
domains.